Abstract. Using vision for navigation is important for many animals, and a common debate concerns the extent to which spatial performance can be explained by "simple" view-based matching strategies. We discuss, in the context of recent work, how confusion between image-matching algorithms and the broader class of view-based navigation strategies is hindering the debate around the use of vision in spatial cognition. A proper consideration of view-based matching strategies requires an understanding of the visual information available to a given animal within a particular experiment.
1 How might animals use vision for navigation?

The navigation of many animals relies on vision, through learning the appearance of the world from important locations. One interesting question concerns the processing and computation needed to go from visual input to navigational memory and then to behaviour. One possibility is that vision can be used in a rather direct way (sensu J. J. Gibson, 1979). For example, in a complex world, two photographs taken with the same camera can only be identical when the camera location and orientation are matched. This is also true for natural visual systems. Thus, if the view from a location is memorized, it can be used directly, by simple matching with the currently perceived view, to recover both the original location and orientation (Zeil, Hofmann, & Chahl, 2003). An alternative, indirect method would be to process and interpret the egocentric visual input in order to construct a higher-order representation of space with a different coordinate frame, such as an environmentally referenced or allocentric cognitive map. The spatial computations that produce navigational behaviour could then be performed on the new construct.

An emerging debate that pits direct and indirect ideas against each other concerns whether animals possess and use a geometric module to represent the shape of environments (Cheng, 2008). Vertebrates have been assumed to functionally extract the geometrical layout of an environment for reorientation. The original result that inspired this idea came from rats rewarded in one corner of a rectangular arena (Cheng, 1986). Rats often made errors by confusing the rewarded corner with its geometrical equivalent (i.e., the diagonally opposite corner, which shares the same position relative to the rectangular shape of the arena), even when each corner was marked by a distinct visual feature. This suggested that the geometry of the arena was constructed and represented independently of the features that compose it.
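To make the notion of geometrical equivalence concrete, the following Python sketch (our illustration, not code from the cited studies) describes each corner of a rectangle purely by its geometry: the length of the wall on the left and the length of the wall on the right for an observer standing at the centre of the arena and facing that corner. The helper name `corner_signatures` and the choice of the centre as the viewpoint are assumptions made for illustration. Diagonally opposite corners receive identical signatures, which is why a purely geometric description cannot tell them apart.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def corner_signatures(width: float, height: float) -> Dict[Point, Tuple[float, float]]:
    """Map each corner of a width x height rectangle to (left wall length, right wall length).

    Hypothetical helper for illustration: the observer stands at the centre of
    the arena and faces the corner; only the arena's shape is used.
    """
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    centre = (width / 2, height / 2)
    signatures = {}
    for i, corner in enumerate(corners):
        # Facing direction: from the centre towards the corner.
        facing = (corner[0] - centre[0], corner[1] - centre[1])
        sides = {}
        # The two walls meeting at this corner run to the neighbouring corners.
        for neighbour in (corners[i - 1], corners[(i + 1) % 4]):
            wall = (neighbour[0] - corner[0], neighbour[1] - corner[1])
            # Sign of the 2D cross product: positive means the wall lies to the left.
            cross = facing[0] * wall[1] - facing[1] * wall[0]
            sides["left" if cross > 0 else "right"] = math.hypot(*wall)
        signatures[corner] = (sides["left"], sides["right"])
    return signatures


sigs = corner_signatures(2.0, 1.0)
for corner, sig in sigs.items():
    print(corner, sig)
```

For a 2 x 1 arena, corners (0, 0) and (2, 1) both map to (2.0, 1.0), and corners (2, 0) and (0, 1) both map to (1.0, 2.0): each corner is geometrically distinguishable from its neighbours but not from its diagonal partner.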

The alternative explanation involves simply storing raw views that are associated with the goal corner. It has been shown that the shape of such arenas is implicitly contained in panoramic views (Stürzl, Cheung, Cheng, & Zeil, 2008) and that simple view-based matching strategies could explain many experimental results (Cheung, Stürzl, Zeil, & Cheng, 2008). The analysis in these papers (Cheung et al., 2008; Stürzl et al., 2008) used a virtual reality model of experimental arenas so that an animal's perspective views could be generated. Views from across the entire arena were compared with a goal view from a position near the target corner. The comparison was performed by summing the intensity differences between location-matched pixels across the two views. Simple methods of this type are often called image-matching strategies, as views are represented by images. We discuss here, in the context of a recent paper, how confusion between simple pixel-wise image-matching algorithms and the broader class of view-based navigation strategies is hindering the debate around the use of vision in spatial cognition.
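The pixel-wise comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used by Stürzl et al. (2008) or Cheung et al. (2008): a panoramic view is assumed to be a list of rows of grey-level intensities whose columns span 360° of azimuth, the sum of absolute differences stands in for whichever difference measure a given study used (root mean squared differences are another common choice), and turning on the spot is modelled as a cyclic shift of columns. Scanning all shifts gives a rotational image difference function whose minimum recovers the heading at which the goal view was stored. The function names are hypothetical.

```python
from typing import List

# A panoramic view: rows of grey-level intensities; columns span 360 degrees of azimuth.
Image = List[List[float]]


def image_difference(a: Image, b: Image) -> float:
    """Sum of absolute intensity differences between location-matched pixels."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def rotate(image: Image, shift: int) -> Image:
    """Model turning on the spot as a cyclic shift of every row by `shift` columns."""
    return [row[shift:] + row[:shift] for row in image]


def rotational_difference(current: Image, goal: Image) -> List[float]:
    """Image difference between the rotated current view and the goal view, per shift."""
    return [image_difference(rotate(current, s), goal) for s in range(len(goal[0]))]


def best_shift(current: Image, goal: Image) -> int:
    """Shift (in columns) at which the current view best matches the stored goal view."""
    diffs = rotational_difference(current, goal)
    return min(range(len(diffs)), key=diffs.__getitem__)


goal = [[0.0, 0.0, 9.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 5.0, 1.0, 1.0, 1.0]]
current = rotate(goal, 2)  # same place, but the viewer has turned by two columns
print(best_shift(current, goal))  # 4: a further 4 columns (2 + 4 = 6) completes the turn
```

Note that a view with 180° rotational symmetry would yield two equally deep minima, one for each of two opposite headings. This is one way in which pixel-wise image matching can produce the kind of diagonal confusion described above.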

2 The difference between image matching and view-based matching 

Across a series of papers, Lee and colleagues have tried to pick apart the use of geometry and image matching in reorientation tasks for infants (Lee & Spelke, 2011) and chicks (Lee, Spelke, & Vallortigara, 2012). In a visual working memory task, individuals are shown a rewarded corner in a rectangular array and then disoriented. After their release, the experimenters record whether subjects confuse the correct corner only with its diagonally opposite corner (i.e., geometrical success) or with all other corners (i.e., geometrical failure). Interestingly, chicks, like young children, "succeeded" when the surrounding rectangular shape was defined by a subtle three-dimensional (3D) perturbation of the floor but "failed" when the rectangular shape was defined by salient high-contrast 2D cues, such as a coloured surface on the floor or conspicuous columns at the corners of the rectangle. As the authors claim, this indeed goes directly against the prediction of image matching, because in both conditions with salient 2D cues a raw panoramic image oriented toward the rewarded corner would also generate a good match when facing the diagonally opposite corner. However, we would like to emphasize why those results, which do refute image matching, do not similarly refute view-based matching.

View-based matching refers not to the 2D or 3D nature of the cues used but to the fact that views are stored and matched in an egocentric frame of reference. A key question, for any given experimental subject, is what information would be present in such an egocentric view. That is, we need to understand an animal's umwelt or self-world (von Uexküll, 1957). Walking insects appear to use mostly 2D cues, hence the relevance of using 2D images to model their views. But flying insects and vertebrates can generate effective depth information from parallax and/or binocularity. These depth cues can also be incorporated into view-based models of reorientation (bees: Dittmar, Stürzl, Baird, Boeddeker, & Egelhaaf, 2010; humans: Pickup, Fitzgibbon, Gilson, & Glennerster, 2011). View-based models should consider visual properties such as colour, regional acuity variations, and binocularity, as well as the influence of active sensing on the information perceived (e.g., self-generated parallax). That is, we have to remember that navigating animals are embodied cognitive systems (Clark, 1997) with particular sensors and particular ways of moving in the world.
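To illustrate that view-based matching is not tied to 2D intensity images, the sketch below (a hypothetical Python example, not a model from Dittmar et al., 2010, or Pickup et al., 2011) stores an egocentric depth estimate alongside intensity at each pixel and compares views channel by channel. The two-channel pixel format and the weights `w_intensity` and `w_depth` are assumptions chosen for illustration. Both channels are still stored and compared in the same egocentric frame, so the strategy remains view-based even though it now uses 3D information.

```python
from typing import List, Tuple

# Hypothetical pixel format: (intensity, egocentric depth estimate, e.g. from parallax).
Pixel = Tuple[float, float]
View = List[List[Pixel]]


def view_difference(a: View, b: View, w_intensity: float = 1.0, w_depth: float = 1.0) -> float:
    """Weighted sum of absolute intensity and depth differences between matched pixels."""
    total = 0.0
    for row_a, row_b in zip(a, b):
        for (int_a, depth_a), (int_b, depth_b) in zip(row_a, row_b):
            total += w_intensity * abs(int_a - int_b) + w_depth * abs(depth_a - depth_b)
    return total


# Two views with identical intensities but different depth structure,
# e.g. a flat painted pattern versus a raised edge of the same appearance.
flat = [[(5.0, 2.0), (5.0, 2.0), (1.0, 2.0)]]
raised = [[(5.0, 2.0), (5.0, 1.5), (1.0, 2.0)]]

print(view_difference(flat, raised, w_depth=0.0))  # 0.0: indistinguishable as images
print(view_difference(flat, raised))               # 0.5: the depth channel separates them
```

Setting `w_depth` to zero recovers plain image matching, so the same egocentric comparison can be tuned to whatever cues a given species actually extracts.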

For example, to fully understand the results in Lee et al. (2012), we need to understand the chick visual system, two aspects of which may play a role in explaining Lee et al.'s pattern of results. (i) With the flat, high-contrast rectangular shape on the floor, perhaps chicks did not spontaneously discriminate between correct and incorrect corners because, due to their fovea, high-contrast features may have increased salience in the frontal visual field. Thus, chicks' perspective views facing all four corners would be more similar to each other than raw images would be. (ii) Chicks may have been able to extract shape cues from horizontal walls rather than vertical columns because depth information was generated by vertical head-bobbing rather than horizontal swaying. The results of Lee et al. are informative about the nature of the cues used by chicks for reorientation. However, further knowledge of the chicks' visual system (including any active components) is required before we can address questions about the potential of view-based matching.
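Point (i) can be illustrated by weighting a pixel-wise comparison by azimuth. In the Python sketch below, a Gaussian weight centred on the frontal column stands in for greater frontal salience; the Gaussian form, its 60° width, and the function names are assumptions for illustration and are not a model of the chick fovea. When two views differ only in their lateral or rear parts, the weighted comparison rates them as far more similar than a raw, unweighted image comparison does.

```python
import math
from typing import List

Image = List[List[float]]  # columns span 360 degrees; column 0 points straight ahead


def frontal_weights(n_cols: int, width_deg: float = 60.0) -> List[float]:
    """Gaussian weight per column: 1.0 straight ahead, near 0 behind the viewer."""
    weights = []
    for c in range(n_cols):
        azimuth = (360.0 * c / n_cols + 180.0) % 360.0 - 180.0  # in [-180, 180)
        weights.append(math.exp(-0.5 * (azimuth / width_deg) ** 2))
    return weights


def weighted_difference(a: Image, b: Image, weights: List[float]) -> float:
    """Sum of absolute pixel differences, each scaled by its column's weight."""
    return sum(
        w * abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb, w in zip(row_a, row_b, weights)
    )


# Two views that share a frontal feature but differ behind the viewer (column 4).
view_a = [[5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
view_b = [[5.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0]]

uniform = [1.0] * 8
print(weighted_difference(view_a, view_b, uniform))             # 9.0
print(weighted_difference(view_a, view_b, frontal_weights(8)))  # about 0.1
```

The point is only that sensor-specific weighting can change which views look alike; which weighting, if any, applies to chicks is exactly the empirical question raised above.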

Within this and other experimental paradigms, the research program that might lead to a full evaluation of view-based matching would involve a systematic investigation of an animal's visual system and its ability to discriminate different cues. This knowledge would allow the design of experiments in which simple manipulations of the environment or of the subject's starting position predict that a subject using a view-based matching strategy will take different paths (e.g., Wystrach, Cheng, Sosa, & Beugnon, 2011). In contrast, such manipulations should not alter the straightness of the path of a subject using higher-order representations, enabling us to distinguish between the two hypotheses. Similarly, forcing subjects to perceive a scene from different directions during training should affect view-based matching but not allocentric navigation. With such an approach, Pecchia and Vallortigara (2010) and Pecchia, Gagliardo, and Vallortigara (2011) demonstrated the use of a view-based matching strategy for certain tasks in chicks and pigeons.

3 Conclusion 

Because of the parsimony of the idea, the use of views for navigation is often thought of as an insect solution. However, view-based matching is a useful strategy for any navigator (Wystrach & Graham, 2012). For animals with any type of visual system, view-based matching is an inexpensive process because information is perceived, stored, and used in the same egocentric frame of reference. The agent is therefore freed of any computations required to move information between different coordinate schemes (i.e., from egocentric to allocentric for storage and from allocentric to egocentric for action). This is true for navigation but does not necessarily mean that view-based matching is good for other tasks. Object recognition, for instance, needs to be viewpoint invariant, and the demands of that task may therefore have driven different perceptual systems (Biederman & Gerhardstein, 1995). Alternatively, object recognition may depend on the integrated use of multiple egocentric views (Tarr & Bülthoff, 1998).
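The coordinate transformations that, as argued above, a view-based navigator is freed from can be written out explicitly. As a standard illustration in our own notation (not the authors'), suppose an agent at allocentric position x with heading θ perceives a landmark at egocentric position p_ego in the horizontal plane. Storing the landmark in an allocentric map, and later turning the stored location back into an egocentric target for action, requires an estimate of the agent's own pose (x, θ) at both times:

```latex
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
\mathbf{p}_{\text{allo}} = \mathbf{x} + R(\theta)\,\mathbf{p}_{\text{ego}},
\qquad
\mathbf{p}_{\text{ego}} = R(\theta)^{\top}\left(\mathbf{p}_{\text{allo}} - \mathbf{x}\right).
```

The second equation follows from the first because R(θ) is orthogonal, so its inverse is its transpose. A view-based matcher compares the current egocentric view directly with a stored egocentric view, so it needs neither transformation nor the pose estimate they depend on.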

We have explained here that refuting 2D image matching and emphasizing the use of 3D cues in visuospatial tasks (Lee et al., 2012) is interesting, as it provides insight into which cues are extracted by the visual system of a given species. However, this approach does not fully test for view-based matching. View-based matching refers not to which cues are used but to how those cues are used. We hope this paper will help future studies to clearly disentangle the nature of the visual cues used by an animal from how those cues are processed and used, bearing in mind that view-based matching is often an alternative hypothesis to the use of allocentric representations.